Abstract: Time series prediction is an active area of research in the machine learning community because of its wide range of practical applications, such as pattern recognition, stock market analysis, weather forecasting, intelligent transport and trajectory forecasting, earthquake prediction, EEG analysis, and more generally any domain of applied science and engineering that involves temporal measurements.

This paper applies several time series forecasting algorithms to data on annual changes in Earth’s rotation, i.e. day length, and compares the results.

I. Introduction 

A time series is a discrete sequence of values of a time-varying quantity, measured at successive points in time. Such a collection of raw observations of a real-world quantity, taken at fixed time points within a given finite interval, constitutes a time series [1].

Time series analysis involves describing the salient features of the series and predicting its future values. A suitable mathematical model is fitted to a given time series and the corresponding parameters are estimated using known data values [2]. The selected model describes the underlying data generating process for the series, and future values are predicted from this model.

In this paper, we apply the ARIMA, ELM, SVR and Feed Forward Network algorithms to our dataset and compare the results while varying the parameters of the respective algorithms. The rest of the paper is organised as follows: Section II explains the importance of this project; Section III describes and analyses the dataset used; Sections IV, V, VI and VII discuss the implementation of the ARIMA, Feed Forward Network, SVR and ELM algorithms respectively; Section VIII compares the best results of these methods; and finally the conclusion is given in Section IX.

II. Importance 

The Moon’s gravitational pull causes the ocean tides to rise and fall while Earth rotates beneath them. Angular momentum is transferred from Earth to the Moon, as a result of which Earth loses energy and the Moon gains energy. Earth’s rotation slows down due to this loss, whereas the Moon’s distance from Earth increases due to its gain [3].

The continuous change in the rotation rate of Earth has significant geophysical effects [4]. It alters the day and night cycle. It is gradually changing Earth’s shape from an oblate spheroid towards a sphere. It has profound effects on Earth’s dynamic atmosphere, such as greater evaporation and strong global winds. It has also caused earthquakes and set up dynamic stresses and pressures in the interior of the Earth.

Due to tidal effects as well as non-tidal effects such as global warming and the polar ice caps, it has become extremely difficult to predict this secular change in Earth’s rotation. Through this project, we therefore aim to develop a model that predicts the future change in Earth’s rotation (or change in length of day) from the values of past years, based on various neural network algorithms, so that future calamities can be predicted and mitigated.

III. Dataset 

We have used the data on “Annual Changes in the Earth’s Rotation, Day Length from 1821-1970”, available in the Time Series Data Library and provided by Andrews & Herzberg (1985) [3]. The dataset contains 149 data points with maximum and minimum values of 421 × 10^-5 and -347 × 10^-5 respectively (from here on, all values of MSE discussed are of the order 10^-5). A graph of the data is shown at the end of the paper.

Fig. 1 shows the correlation of the present value with each of the eight preceding values. The correlation decreases as we move farther away from the present value. Up to the past five values, the correlation is greater than 75%. We can therefore expect older values to contribute less to the prediction of the future value. We verify this in later sections.
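Lag correlations of the kind plotted in Fig. 1 can be computed along the following lines. This is a minimal sketch in Python with NumPy, not the code used for this paper; the function name `lag_correlations` and the synthetic stand-in series are assumptions made purely for illustration.

```python
import numpy as np

def lag_correlations(series, max_lag=8):
    """Pearson correlation between y[t] and y[t-k] for k = 1..max_lag."""
    y = np.asarray(series, dtype=float)
    corrs = {}
    for k in range(1, max_lag + 1):
        # Pair each value with the value k steps before it.
        corrs[k] = float(np.corrcoef(y[k:], y[:-k])[0, 1])
    return corrs

# Synthetic random-walk series standing in for the day-length data (assumption).
rng = np.random.default_rng(0)
demo = np.cumsum(rng.normal(size=150))
demo_corrs = lag_correlations(demo)
for lag, r in demo_corrs.items():
    print(f"t-{lag}: {r:.3f}")
```

Applied to the real dataset, the dictionary returned here would hold one correlation per lag, which is the quantity compared against the 75% level in the text.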


Figure 1: Correlation Values of Data (correlation of the present value t with each of the lagged values t-1 to t-8)


IV. ARIMA 

An observed time series can be thought of as a particular realization of a stochastic process, where ‘n’ previous observations are used along with random variables to predict future values. Stochastic models can be divided into two categories: linear and non-linear.

Figure 2: ARIMA: Varying Training to Testing Set Ratio: 0.6, 0.7, 0.9; (p,d,q) = (6,1,0)

Figure 3: ARIMA: Varying p: 4, 5, 6; (d,q) = (1,1), Set Ratio = 0.7

Figure 4: ARIMA: Varying (d,q): (1,0), (1,1), (2,1); (p) = (6), Set Ratio = 0.7

ARIMA (Auto Regressive Integrated Moving Average) models are, in theory, the most general class of models for forecasting a time series that can be made “stationary” by differencing. ARIMA is a combination of Autoregressive (AR) and Moving Average (MA) models [5].

The model assumes the future value to be a linear combination of ‘p’ past observations (the “autoregressive” terms) and ‘q’ past errors (the “moving average” terms), together with a constant and a random error, after the sequence has been differenced ‘d’ times to make it stationary (the “integrated” part). An ARIMA(p,d,q) model is expressed as [6]:

$$\Delta^d y_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i \, \Delta^d y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}$$

where $\Delta^d y_t$ is the d-th order difference of $y_t$, c is a constant, $\varepsilon_t$ is the random error, $\varepsilon_{t-j}$ are past errors, and $\varphi_i$ and $\theta_j$ are the coefficients to be optimised.
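To make the autoregressive and integrated parts of the model concrete, the sketch below fits the special case ARIMA(p,1,0) by ordinary least squares on the first differences. It is an illustration under stated assumptions, not the implementation used for the results in this paper: the moving average terms (q > 0) are left out because they require iterative estimation, and the function names `fit_ar_on_differences` and `forecast_next` as well as the toy series are hypothetical.

```python
import numpy as np

def fit_ar_on_differences(y, p):
    """Least-squares fit of c and phi_1..phi_p on the first differences of y."""
    z = np.diff(np.asarray(y, dtype=float))
    # Each row holds the p previous differences, most recent first.
    rows = [z[t - p:t][::-1] for t in range(p, len(z))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return coef[0], coef[1:]              # constant c, AR coefficients phi

def forecast_next(y, c, phi):
    """One-step-ahead forecast of y from the fitted difference model."""
    y = np.asarray(y, dtype=float)
    z = np.diff(y)
    p = len(phi)
    z_next = c + float(np.dot(phi, z[-p:][::-1]))
    return y[-1] + z_next                 # integrate the predicted difference

# Toy series whose first differences follow z_t = 1 + 0.5 * z_{t-1} (assumption).
z = [0.0]
for _ in range(19):
    z.append(1.0 + 0.5 * z[-1])
toy = np.concatenate([[10.0], 10.0 + np.cumsum(z)])
c_hat, phi_hat = fit_ar_on_differences(toy, p=1)
next_value = forecast_next(toy, c_hat, phi_hat)
```

On this noise-free toy series the fit recovers the generating constant and coefficient, which is a convenient sanity check before trying larger ‘p’ on real data.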


Simulations on varying values of p, d, q and training to test 
dataset ratio were done and the results are shown in figures 2 
to 4. 

From Fig. 2 we can see that a training to testing set ratio that is either too low or too high gives a high mean error. A ratio of 0.7 was observed to be optimal for our dataset. Fig. 3 shows the variation in MSE with the number of past values taken for prediction. MSE decreases as ‘p’ is increased to 6, beyond which it increases. This is in accordance with the discussion in Section III, where we saw that the correlation is above 70% up to ‘t-6’ while that of ‘t-8’ with ‘t’ is very low. Values that far back therefore do not contribute significantly to the prediction. In Fig. 4, increasing ‘q’ from 0 to 1 decreases MSE, while increasing ‘d’ drastically worsens it.
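The training to testing set ratio and the MSE used throughout can be sketched as follows. This is a minimal illustration in plain Python, assuming a chronological (unshuffled) split; the paper does not state how the split was made, and the helper names and the persistence baseline are hypothetical additions for the example.

```python
def chronological_split(series, train_ratio=0.7):
    """Split without shuffling: earliest values train, latest values test."""
    cut = round(len(series) * train_ratio)
    return series[:cut], series[cut:]

def mean_squared_error(actual, predicted):
    """Average of squared differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("length mismatch")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

values = list(range(10))
train, test = chronological_split(values, 0.7)
# Persistence baseline: predict each test value as the value just before it.
history = train[-1:] + test[:-1]
baseline_mse = mean_squared_error(test, history)
```

A baseline of this kind gives a reference MSE against which the fitted models can be judged.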

With this prediction model, the best output is obtained at:

• Training to Testing Set Ratio = 0.7

• (p,d,q) = (6,1,1)

• MSE = 214.8

Figure 9: Fletcher-Powell CG

Figure 10: Gradient Descent with Momentum

Figure 11: LM

Figure 12: Scaled Conjugate Gradient

V. Feed Forward Neural Network 

Artificial neural networks (ANNs) play a vital role in learning the dynamic behavior of a time series. Learning with ANNs has an advantage over the traditional Least Mean Square (LMS) approach used in curve fitting: the LMS rule presumes a fixed functional form, whereas neural networks allow a variable functional form, which gives the network the freedom to adapt its parameters autonomously so as to produce the desired output for known inputs. Moreover, ANNs allow non-linear modelling of the time series.

The most common model in the domain of ANNs is the Multilayer Perceptron (MLP). MLPs contain at least three layers: input, hidden and output. There may be more than one hidden layer, and the nodes in the various layers are known as processing units. Various optimization algorithms exist for calculating the optimal parameters of the network. We have applied eight such methods, whose results are shown in figures 5 to 12. These methods and the corresponding MSEs obtained are as follows:

1) Resilient Backpropagation: 125.669 

2) Bayesian Regularisation: 81.977 

3) BFGS Quasi-Newton: 96.039 

4) Conjugate Gradient with Powell/Beale Restarts: 106.454 

5) Fletcher-Powell Conjugate Gradient: 109.681 

6) Gradient Descent with Momentum: 935.758 

7) Levenberg-Marquardt: 82.654

8) Scaled Conjugate Gradient: 87.365 

It is seen that the Bayesian Regularisation optimization algorithm gives the best results, whereas gradient descent with momentum gives the worst.
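As an illustration of one of the eight training rules listed above, the sketch below trains a one-hidden-layer network by gradient descent with momentum on lagged inputs. It is a NumPy sketch under assumptions, not the setup used for the reported MSEs: the network size, learning rate, momentum value, number of epochs and the synthetic sine series are all chosen here for illustration only.

```python
import numpy as np

def train_mlp_momentum(X, y, hidden=8, lr=0.01, momentum=0.9, epochs=500, seed=0):
    """One-hidden-layer tanh network trained by gradient descent with momentum."""
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(scale=0.5, size=(X.shape[1], hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }
    velocity = {k: np.zeros_like(v) for k, v in params.items()}
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output.
        h = np.tanh(X @ params["W1"] + params["b1"])
        pred = (h @ params["W2"] + params["b2"]).ravel()
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared error.
        d_pred = (2.0 / len(y)) * err[:, None]
        grads = {"W2": h.T @ d_pred, "b2": d_pred.sum(axis=0)}
        d_h = (d_pred @ params["W2"].T) * (1.0 - h ** 2)
        grads["W1"] = X.T @ d_h
        grads["b1"] = d_h.sum(axis=0)
        # Momentum update: the velocity accumulates past gradients.
        for k in params:
            velocity[k] = momentum * velocity[k] - lr * grads[k]
            params[k] += velocity[k]
    return params, losses

# Synthetic series (assumption): predict each value from its 4 predecessors.
signal = np.sin(0.2 * np.arange(120))
lags = 4
X_demo = np.array([signal[i:i + lags] for i in range(len(signal) - lags)])
y_demo = signal[lags:]
_, demo_losses = train_mlp_momentum(X_demo, y_demo)
```

The returned loss history makes it easy to see how quickly a given rule converges, which is one way the optimisers above could be compared.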


VI. Support Vector Regression 

Support Vector Machines (SVM), developed by Vapnik et al. [7] in 1995, are used for many machine learning tasks such as pattern recognition, object classification and, in the case of time series prediction, regression analysis. They are based on statistical learning theory, or VC (Vapnik-Chervonenkis) theory.

Support Vector Regression (SVR) relies on a loss function that ignores errors lying within a certain distance of the true value. This type of function is often called an ε-insensitive loss function [8]. The cost of errors is zero for all points that are inside this band.
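The ε-insensitive loss described above can be written as max(0, |error| - ε) per point. The following plain-Python sketch shows this; the function name and the sample numbers are illustrative assumptions and are not taken from the experiments.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean of max(0, |error| - epsilon) over all points."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

targets = [1.0, 2.0, 3.0]
outputs = [1.01, 2.5, 2.0]
# With a narrow band only the first point (error 0.01) costs nothing.
narrow = epsilon_insensitive_loss(targets, outputs, epsilon=0.015)
# With a wide band every error falls inside it, so the loss is zero.
wide = epsilon_insensitive_loss(targets, outputs, epsilon=5.015)
```

The wide-band case shows why a large ε lets a model tolerate large deviations at no cost.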

While using SVR, we varied the following parameters: training to testing data set ratio, farthest past value used, and ε. Results are shown in figures 13 to 15.

Again, it can be seen that varying the data ratio changes the MSE. Also, taking past values much farther from the present value worsens the MSE because of their low correlation. This is in accordance with the discussion in Sections III and IV. It is also seen that increasing ε increases the MSE. This is because a high ε means that the error cost is zero in a larger band around the optimal curve, so the machine accepts, and hence predicts, values with larger deviations from the ideal.

VII. Extreme Learning Machines

The ELM algorithm, proposed by G.-B. Huang, uses Single Layer Feedforward Neural Networks (SLFN). ELMs are fast and computationally inexpensive. The main reason for this lies in the random initialization of the input weights. With these fixed, the output weights can be obtained directly by calculating the inverse (or pseudoinverse) of the hidden layer output matrix. This eliminates the need to optimise both layers’ weights iteratively.
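In the standard ELM formulation, the input weights and hidden biases are drawn at random and left untrained, and only the output weights are solved for, using the pseudoinverse of the hidden-layer output matrix. The sketch below illustrates this with NumPy. It is not the code used in this paper; the function names, the tanh activation, the hidden layer size and the synthetic series are assumptions made for illustration.

```python
import numpy as np

def elm_fit(X, y, hidden=20, seed=0):
    """Fit a single-hidden-layer ELM: random hidden layer, solved output layer."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))   # random input weights, never trained
    b = rng.normal(size=hidden)                 # random hidden biases, never trained
    H = np.tanh(X @ W + b)                      # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the solved output weights."""
    return np.tanh(X @ W + b) @ beta

# Synthetic series (assumption): predict each value from its 4 predecessors.
signal = np.sin(0.2 * np.arange(120))
lags = 4
X_demo = np.array([signal[i:i + lags] for i in range(len(signal) - lags)])
y_demo = signal[lags:]
W, b, beta = elm_fit(X_demo, y_demo)
train_mse = float(np.mean((elm_predict(X_demo, W, b, beta) - y_demo) ** 2))
```

Because no iterative training is involved, fitting costs a single pseudoinverse, which is the source of the speed advantage mentioned above.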
































Figure 13: SVR: Varying Training to Testing Set Ratio: 0.7, 0.8, 0.9; Max Past = 5, ε = 0.015

Figure 14: SVR: Varying Max Past: 4, 5, 8; Set Ratio = 0.7, ε = 0.015





Figure 15: SVR: Varying ε: 0.015, 5.015, 50.005; Max Past = 4, Set Ratio = 0.7

Figures 16 to 23: Prediction using ELM (x-axis: Future Year Values)


Figure 16: H1 = 13


Figure 17: H2 = 13 


Figure 18: H3 = 13 


Figure 19: H4 = 13 




Figure 20: H3 = 10 


Figure 21: H3 = 12 


Figure 22: H3 = 13 


Figure 23: H3 = 15 
The Optimally-Pruned Extreme Learning Machine (OP-ELM), based on the original ELM, is used for long-term time series prediction. This algorithm greatly decreases the computational time required for training and for selecting the model structure of the network. Moreover, the algorithm is simple and easy to implement.

During simulations with ELM methodology, two parameters 
were changed: number of hidden layers and number of neurons 
in them. The following combinations were tried: 

• Hidden Layer 1 (H1) = 8, 10, 13 neurons

• H1 = 12, H2 = 13, 16, 18

• H1 = 12, H2 = 16, H3 = 10, 12, 13, 15, 18

• H1 = 12, H2 = 16, H3 = 15, H4 = 13, 15, 16

Some results are shown in figures 16 to 23. 

VIII. Comparison of Methods Applied 

It is observed that the feed forward neural network approach 
gives the lowest mean squared errors as compared to the other 
three approaches for our dataset. Barring the gradient descent 
with momentum optimization rule, all other rules gave MSE 
around 100. 

SVR and ARIMA produce broadly similar results. The general trend is that prediction improves as the number of past values taken is increased and as the data set ratio is decreased, but only up to a limit, beyond which it worsens.

The ELM algorithm gave the worst results, with very high error and almost no correlation with the test data, owing to the fact that the first layer weights are randomly initialised and remain unoptimised during the whole process.

IX. Conclusion 

Time series prediction is an active area of research because it arises frequently in many practical domains. Its wide range of applications makes it an indispensable tool. One such application has been demonstrated in this paper, showing the ability of neural networks to learn complex, seemingly non-linear patterns from data and to extrapolate them to predict future values. Selecting an appropriate model that can produce accurate forecasts from a description of the historical pattern in the data, and determining the optimal model orders, is vital. Hence model identification, parameter estimation and diagnostic checking are crucial to minimizing prediction error.

The work can be further extended by applying several 
other prediction algorithms such as RBF, ARCH, GARCH, 
genetic algorithms and many more. Prediction of non-linear 
series and improving accuracy are the main challenges that 
are continuously addressed by researchers. 
Figure 24: Graph of Dataset, with years on x-axis and change in day length (×10^-5) on y-axis


